
verdi storage backup #6069

Merged
merged 63 commits into from
Apr 18, 2024

Conversation

eimrek
Member

@eimrek eimrek commented Jul 5, 2023

This PR adds functionality to back up an AiiDA profile.

@codecov

codecov bot commented Jul 5, 2023

Codecov Report

Attention: Patch coverage is 86.27451% with 14 lines in your changes missing coverage. Please review.

Project coverage is 78.16%. Comparing base (e330004) to head (98f8456).
Report is 8 commits behind head on main.

Files                                            Patch %   Lines
src/aiida/storage/psql_dos/backend.py            81.40%    8 Missing ⚠️
src/aiida/orm/implementation/storage_backend.py  89.48%    4 Missing ⚠️
src/aiida/cmdline/commands/cmd_storage.py        93.34%    1 Missing ⚠️
src/aiida/storage/sqlite_dos/backend.py          66.67%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6069      +/-   ##
==========================================
+ Coverage   77.81%   78.16%   +0.35%     
==========================================
  Files         550      558       +8     
  Lines       40720    42009    +1289     
==========================================
+ Hits        31684    32833    +1149     
- Misses       9036     9176     +140     


@sphuber
Contributor

sphuber commented Jul 6, 2023

Thanks a lot @eimrek. It would indeed be super useful to have an easy way to back up a profile's data storage (and restore it), so I appreciate you looking into this.

Where is the correct location?

If it is necessary to have a standalone bash script, I think it would make most sense to put it in the docs directory, and add a section (if it doesn't already exist) in the docs where it explains how to use it and there include an HTML link that allows the user to download a copy of the script. There are already examples in the docs of how you can add such a download link to serve a file.

Does calling the separate check_if_profile_locked.py make sense?

Not sure. I don't think it is actually being used at all in the current state of this PR, is it? Is your idea that this backup should be able to run even when the profile is in use? Or should the user make sure nothing is using the profile during the backup process? I would personally definitely favor the latter option. Just as with verdi storage maintain --full, we could enforce that the profile is not in use by adding this check. If the check is needed, ideally we don't have an ad-hoc separate Python script for this but build it into verdi somehow.

Then a few comments about the current design:

  • We have been working on removing any code in the Python API and the verdi CLI that hardcodes the implementation of the psql_dos storage backend. It is now possible to have profiles with different storage backends, so hard-coding these commands to psql_dos is not what we want. A good example of a command that still does this is verdi profile delete. This should really be refactored so that the deletion logic is implemented on the PsqlDosStorage class and verdi profile delete just calls it. Similarly, I don't think it is a good idea to add verdi profile dbdump and hardcode it to psql_dos. I am wondering if instead we should define an abstract backup method on the StorageBackend class and have PsqlDosBackend implement it. I also don't see why this should cover only the database and not also the repository.
  • Why is the backing up of the disk-objectstore container done in so many individual steps in the backup script? Why doesn't it just do an rsync -azh --delete? Is this because you want to make it possible to have the backup script run on a profile that is actively being used?

My first gut-instinct would be to implement something like the following:

# Add abstract method to the base storage class

import os
import pathlib
import subprocess
from abc import abstractmethod


class StorageBackend:

    @abstractmethod
    def backup(self, **kwargs):
        """Create a backup of the storage contents."""

# which is then implemented in subclasses such as the `psql_dos` backend

class PsqlDosBackend(StorageBackend):

    def backup(self, filepath_backup: pathlib.Path):
        """Create a backup of the postgres database and disk-objectstore to the provided path."""
        # Here you can now use the Python API to get access to connection details of the postgres
        # database and location of the disk-object store. Simplest is then to use `subprocess` to
        # call `pg_dump` and `rsync`.
        cfg = self._profile.storage_config

        env = os.environ.copy()
        env['PGPASSWORD'] = cfg['database_password']

        try:
            subprocess.run(
                ['pg_dump', '-h', cfg['database_hostname'], '-f', str(filepath_backup / 'db.psql'), ...],
                check=True,
                env=env,
            )
        except subprocess.CalledProcessError:
            ...  # Handle it

        filepath_container = get_filepath_container(self._profile)

        try:
            subprocess.run(['rsync', '-azh', str(filepath_container), str(filepath_backup)], check=True)
        except subprocess.CalledProcessError:
            ...  # Handle it

The advantage of this approach would be:

  • Not adding any verdi commands that are hardcoded to work only for the psql_dos storage; they can also work for any other storage implementation
  • No need for separate scripts that need to be maintained and downloaded by the user
  • Can make a generic verdi storage backup command that works for all profiles and storage backends. This command can make use of the built-in profile locking and checks, just like verdi storage maintain does, so no need for stand-alone script for this

Do you see any fundamental problems with such an approach? Is there anything critical in the bash script you wrote that could not be implemented in Python on the PsqlDosBackend class?

@eimrek
Member Author

eimrek commented Jul 6, 2023

Thanks a lot @sphuber for the comments. What you say makes sense. Initially the idea was that a bash script would be preferable because it allows users to see the logic and potentially modify it for their use case, but this is probably not needed very often. We discussed this, and implementing it directly in Python, as you suggest, is probably better. I'll try to adapt.

Regarding

Why is the backing up of the disk-objectstore container done in so many individual steps in the backup script? Why doesn't it just do an rsync -azh --delete? Is this because you want to make it possible to have the backup script run on a profile that is actively being used?

The idea is that if the user is adding nodes at the same time as the backup is running, this is the safest order in which to back up the DOS: 1. loose files; 2. sqlite db; 3. packed files.
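
For illustration, a minimal sketch of that ordering, assuming a typical disk-objectstore container layout (loose objects under loose/, a SQLite pack index file, packed data under packs/); the names and paths here are illustrative, not the PR's actual code:

import subprocess
from pathlib import Path


def backup_container(container: Path, dest: Path) -> None:
    """Back up a disk-objectstore container in an order that tolerates concurrent writes."""
    # 1. Loose objects first: packing only moves data from loose/ to packs/,
    #    so an object packed mid-backup is still captured by step 3.
    subprocess.run(['rsync', '-azh', f'{container}/loose/', f'{dest}/loose/'], check=True)
    # 2. The SQLite pack index next: backing it up before the packs means it
    #    can never reference pack data that is missing from the backup.
    subprocess.run(['rsync', '-azh', f'{container}/packs.idx', f'{dest}/'], check=True)
    # 3. Pack files last: packs are append-only, so they contain at least
    #    everything the already-backed-up index refers to.
    subprocess.run(['rsync', '-azh', f'{container}/packs/', f'{dest}/packs/'], check=True)

The point of the order is that a concurrent packing operation can only make the backup over-complete, never leave the index pointing at missing data.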

@sphuber
Contributor

sphuber commented Jul 6, 2023

The idea is that if the user is adding nodes at the same time as the backup is running, this is the safest order in which to back up the DOS: 1. loose files; 2. sqlite db; 3. packed files.

I see. I would personally just opt to say that we don't support using the profile during backup. We do the same for the maintenance operation. I can see the argument that for full maintenance concurrency is technically impossible, whereas for backing up it is possible in the way you describe, but it does add complexity and potential sources of problems. Would it not be easier to just tell users that the profile should not be used while backing up the storage? I think this is reasonable, or do you see a fundamental problem where people might need/want to back up so often that stopping use of the profile is too disruptive?

And if it is necessary to be able to do a live backup, maybe this logic can be implemented in the disk-objectstore directly? If it is this specific, it might make sense to have some API there, either in the Python API or through the CLI (I think it has a dostore CLI endpoint). If its data structures or behavior ever change, the backup logic can then be updated in a coherent manner.

@giovannipizzi
Member

@sphuber thanks for the comments - I agree with the suggestions above. At the beginning the idea was that it would be complex to make this general enough, so it was better to generate a script that people could change. But in the end it's probably possible, so we should just have a command in verdi that does everything. This would also allow us to avoid concurrent maintenance operations by just using the existing functionality in AiiDA. I agree with doing it on the backend.

However, I think it's essential that people are able to back up while they are using AiiDA, and this is fast for incremental backups. If we ask people to stop their work to back up, they will essentially never do it. It should be something that they can put in a cron job and forget about.

We could do something in disk-objectstore, and I see the points, but this would not allow us to easily reuse the logic to lock the profile etc., and we want to back up the DB as well; in AiiDA it will be specific to the backend anyway. I suggest we go ahead and at least converge on a version we are happy with. We can always move this to disk-objectstore once we see the final implementation, either before merging this PR or even later in the future (aiidateam/disk-objectstore#20)

@sphuber
Contributor

sphuber commented Jul 7, 2023

However, I think it's essential that people are able to back up while they are using AiiDA, and this is fast for incremental backups. If we ask people to stop their work to back up, they will essentially never do it. It should be something that they can put in a cron job and forget about.

Fair enough. If you think we can guarantee to do this safely, then I'm fine with doing it that way.

We could do something in disk-objectstore, and I see the points, but this would not allow us to easily reuse the logic to lock the profile etc., and we want to back up the DB as well; in AiiDA it will be specific to the backend anyway.

I don't understand how this follows. If you implement the functionality in the disk-objectstore to backup a container, either through Python API and/or its CLI, the AiiDA storage backend can just call this. So from AiiDA's perspective, it will be behind the profile locking functionality. Of course when users would go straight to the disk-objectstore API they could circumvent this, but in the disk-objectstore the concept of a profile doesn't exist anyway so there is nothing to lock.

So if we are anyway writing the logic to create a container backup, then I don't see a reason not to directly put it in the disk-objectstore package and have aiida-core call it. Since we control this package, once @eimrek has implemented the logic in this PR and we have verified it works, it is trivial to move it to disk-objectstore, make a release and then use it here.

@eimrek eimrek changed the title from "Profile backup script" to "verdi storage backup" Aug 3, 2023
@eimrek
Member Author

eimrek commented Aug 3, 2023

I had some time to work on this. Still keeping this a draft, as there is some reorganization left to be done. The two main methods added (both in aiida/storage/psql_dos/backend.py) are:

  1. backup(path, remote, prev_backup)
    This backs up the storage data to path with rsync and hard links against prev_backup.
  2. backup_auto(path, remote)
    This is higher level, what we discussed with @giovannipizzi and @superstar54, where the user doesn't have to manage prev_backup themselves, and the backup is done in a safe manner to path, replacing the old backup upon completion. This is currently exposed via verdi storage backup.

Method 2) probably shouldn't be in aiida/storage/psql_dos/backend.py, as it would work with any storage backend.

Another option is to instead expose method 1) via verdi storage backup and remove method 2) from aiida-core. This would be more flexible, but it might be less convenient for users.
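
For reference, the "hard links against prev_backup" behaviour of method 1) is what rsync's --link-dest option provides; a minimal sketch with hypothetical names, not the PR's actual implementation:

import subprocess
from pathlib import Path
from typing import Optional


def rsync_snapshot(source: Path, dest: Path, prev_backup: Optional[Path] = None) -> None:
    """Create a snapshot where files unchanged since prev_backup become hard links."""
    cmd = ['rsync', '-azh', '--delete']
    if prev_backup is not None:
        # Unchanged files are hard-linked from the previous snapshot instead of
        # copied, so every snapshot looks like a full tree but only changed
        # data consumes disk space.
        cmd.append(f'--link-dest={prev_backup}')
    cmd += [f'{source}/', f'{dest}/']
    subprocess.run(cmd, check=True)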

@sphuber, if you have a moment, any suggestions welcome.

@eimrek eimrek requested a review from sphuber August 3, 2023 17:04
@eimrek eimrek marked this pull request as ready for review September 7, 2023 18:01
@eimrek
Member Author

eimrek commented Sep 7, 2023

Reorganized a bit: there are some command line commands that I'm calling (including rsync) that don't really need to be in the PsqlDosBackend class, so I separated them into a module: backup_utils. Completely fine to put it somewhere else.

One tiny usability doubt for the current implementation: without specifying prev_backup, the backup is made to a subfolder of the specified --path: <path>/last-backup. But if the user does specify prev_backup, it is created directly in path. I think this is explained adequately in the --help. The reasoning is that in the default case, I automatically manage the "previous" and "live" backups, and this should be done via subfolders. When the user specifies prev_backup, making subfolders doesn't really make sense. Any suggestions/feedback welcome though.

Pinging those who were interested (a quick test would be useful): @unkcpz @superstar54 @edan-bainglass

@superstar54
Member

@eimrek I did a test in an AiiDAlab docker container. It works. I share the command here:

verdi storage backup --pg_dump_exec /opt/conda/envs/aiida-core-services/bin/pg_dump --path /home/jovyan/test/aiida/backup

When I use the remote folder, it requires me to input the password more than ten times. Is it possible to reduce this? Or maybe print a message before asking for the password; otherwise the user may think the password is wrong.

@edan-bainglass
Member

From an aiidalab container running on my Windows machine, I get the following:

(base) (backup-script) aiida-core > verdi storage backup --pg_dump_exec /opt/conda/envs/aiida-core-services/bin/pg_dump --path C:\\Users\\edanb\\Aurora\\backup --remote [email protected]
Warning: You are currently using a post release development version of AiiDA: 2.4.0.post0
Warning: Be aware that this is not recommended for production and is not officially supported.
Warning: Databases used with this version may not be compatible with future releases of AiiDA
Warning: as you might not be able to automatically migrate your data.

[email protected]'s password: 
[email protected]'s password: 
Report: Remote '[email protected]' is accessible!
[email protected]'s password: 
[email protected]'s password: 
[email protected]'s password: 
Error: Command '['ssh', '[email protected]', 'mkdir', 'C:\\Users\\edanb\\Aurora\\backup/live_backup']' returned non-zero exit status 1.
Error: Couldn't access/create 'C:\Users\edanb\Aurora\backup/live_backup'!

Not sure at the moment what might be the issue here

@edan-bainglass
Member

Okay, one immediate issue is that the path needs to be quote-escaped, so \"C:\.......\". This worked, but then I got

Error: Command '['rsync', '-azh', '-vv', '--no-whole-file', '/tmp/tmprfyv4y7t/db.psql', '[email protected]:"C:/Users/edanb/Aurora/backup"/live_backup/']' returned non-zero exit status 12.

Will keep at it later :)

@eimrek
Member Author

eimrek commented Oct 9, 2023

Thanks a lot for the checks @edan-bainglass.

Do I understand correctly that you want to back up AiiDA from a Linux docker container to a Windows host? I am not sure whether this will work, but it would be very useful to try.

If a Python subprocess command fails, you can try to run it manually and see if a modification would make it work; e.g. the last one would be

rsync -azh -vv --no-whole-file /tmp/tmprfyv4y7t/db.psql [email protected]:"C:/Users/edanb/Aurora/backup"/live_backup/

@edan-bainglass
Member

Yep. That's exactly what I'm doing. Will have to resume tomorrow, but I think it'll work once enough quotes have been escaped 😅 Will report back soon 👍

@edan-bainglass
Member

edan-bainglass commented Oct 10, 2023

This

rsync -azh -vv --no-whole-file /tmp/tmprfyv4y7t/db.psql [email protected]:"C:/Users/edanb/Aurora/backup"/live_backup/

yields

'rsync' is not recognized as an internal or external command,
operable program or batch file.
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(231) [sender=3.2.7]

Some articles online suggest rsync is also required on the Windows end. One person got it to work by installing rsync via cygwin on the Windows side. If this works, I would need to discuss with Empa IT installing this on the Empa workstation.

Another solution is to mount a shared folder from Windows onto the Linux server (container?) and rsync to the shared folder. If this works, I would likely need to write a script to rsync to the shared folder, ssh to the Empa workstation, move the backup from the shared folder over to the dedicated project drive, and clear out the shared folder. Not sure if verdi storage backup would have any issues here.

Will test locally to see if I can get it to work. Not sure which, if any, would be applicable at Empa though 🤔

@edan-bainglass
Member

Some articles online suggest rsync is also required on the Windows end. One person got it to work by installing rsync via cygwin on the Windows side. If this works, I would need to discuss with Empa IT installing this on the Empa workstation.

Installed cygwin on my machine, then rsync on it, then exposed cygwin64/bin to PATH. rsync is now usable on Windows.

verdi storage backup ... still gave issues, so I ran the rsync part manually. One immediate issue is the --path to the backup folder. cygwin starts from the user's dir, so asking for C:/Users/edanb/... yields cygdrive/c/Users/edanb/C:/Users/edanb..., which is obviously wrong. For now, removing C:/Users/edanb from the --path argument circumvents this. However, this runs into the following issue:

rsync -azh -vv --no-whole-file /tmp/tmpdn4ti05n/db.psql [email protected]:"Aurora/backup"/live_backup/
opening connection using: ssh -l edanb host.docker.internal rsync --server -vvlogDtprze.iLsfxCIvu . Aurora/backup/live_backup/  (9 args)
[email protected]'s password: 
sending incremental file list
rsync: [sender] change_dir "/tmp/tmpdn4ti05n" failed: No such file or directory (2)
delta-transmission enabled
total: matches=0  hash_hits=0  false_alarms=0 data=0

sent 19 bytes  received 43 bytes  13.78 bytes/sec
total size is 0  speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1338) [sender=3.2.7]

Any thoughts on change_dir "/tmp/tmpdn4ti05n" failed: No such file or directory (2)?

@eimrek
Member Author

eimrek commented Oct 10, 2023

Thanks @edan-bainglass. Sorry, my previous comment was a bit misleading. I'm dumping the postgresql db to a temporary folder in Python and then rsyncing that. I didn't realize that this won't be available when you run it manually. You can perhaps run tests with other files. In the end, the Python code just uses subprocess to 1) call bash commands on the remote server, for checking input validity and making folders and such; and 2) call rsync. If both of these work manually, we can adapt the Python subprocess commands accordingly.

@edan-bainglass
Member

edan-bainglass commented Oct 11, 2023

@eimrek "...the python code just uses subprocess to..." - regarding 1, you say it uses bash to execute commands on the host server. Well, the host is Windows, so no bash. However, mkdir exists, so at least the folder is created, with the caveat that the path must be in quotes. As for 2, see my previous comment regarding getting rsync to work for windows. rsync did work manually on a random file. Moved it from my container to the windows drive. But I still can't get it to work as part of verdi storage backup. I'll try again later today. I'll also check the mount approach, which would circumvent the cross-platform rsyncing issue.

@edan-bainglass
Member

edan-bainglass commented Nov 3, 2023

Another solution is to mount a shared folder from Windows onto the Linux server (container?) and rsync to the shared folder. If this works, I would likely need to write a script to rsync to the shared folder, ssh to the Empa workstation, move the backup from the shared folder over to the dedicated project drive, and clear out the shared folder. Not sure if verdi storage backup would have any issues here.

Tried this ☝️

Steps:

  1. Start a container from Windows Docker Desktop with a mounted Windows drive (non-WSL) pointing to the container's /home/jovyan/backup
  2. Install dependencies
    • Clone aiida-core -> checkout backup-script -> install
    • Clone disk-objectstore -> checkout backup-cmd -> install
  3. Install rsync on container
    • Had to connect to the container as root to do this (only for this step - remaining steps executed as jovyan)
    • See issue #411 on aiidalab-docker-stack
  4. Run verdi storage backup --pg_dump_exe /opt/conda/envs/aiida-core-services/bin/pg_dump /home/jovyan/backup

Results:

  • Backup produced no issues 🎉
  • Backup accessible via Windows 🎉🎉
  • Follow-up backup uses previous backup correctly to facilitate incremental backups 🎉🎉🎉

@eimrek not sure what the issue was in your case that raised symlink issues 🤷🏻‍♂️

Contributor

@sphuber sphuber left a comment


Now that the PR on the disk-objectstore is merged, can you please update this PR @eimrek. I have already reviewed it. Besides those comments, you should be aware that all source files have moved to src/aiida. When you rebase, git should be smart enough to move things automatically, but new files will have to be moved manually.

@eimrek
Member Author

eimrek commented Jan 12, 2024

Thanks @sphuber for the review. I accidentally did the rebase before implementing your changes, so I couldn't directly commit via GitHub, but I implemented everything manually. The functionality seems to work. Currently there are no tests for this in aiida-core, but at least some basic ones should be added, right?

@sphuber
Contributor

sphuber commented Jan 12, 2024

Thanks @sphuber for the review. I accidentally did the rebase before implementing your changes, so I couldn't directly commit via GitHub, but I implemented everything manually. The functionality seems to work. Currently there are no tests for this in aiida-core, but at least some basic ones should be added, right?

Thanks @eimrek. I agree, we should definitely have some tests. I would add a test for the PsqlDosStorage.backup command directly, testing that the backup actually works. The CLI tests can then be very light, just checking that the parameters work (that reminds me, it might be best to add as much validation as possible in the CLI itself, for example that keep is a positive integer greater than 1). We don't have to check the "correctness" of the actual backup again, as that is already done elsewhere.
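
As an illustration of such CLI-side validation, click (which verdi is built on) ships range types that reject bad values before the command body runs; a hedged sketch with a hypothetical command, not the PR's actual code:

import click


@click.command()
@click.option('--keep', type=click.IntRange(min=1), default=1, show_default=True,
              help='Number of previous backups to keep.')
def backup(keep):
    """Validation happens during parsing, so the command body can trust `keep`."""
    click.echo(f'Keeping {keep} previous backup(s).')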

@eimrek
Member Author

eimrek commented Jan 12, 2024

One thing that might be good is to check that the profile name of a new backup matches that of any existing backups in the destination.

@sphuber
Contributor

sphuber commented Apr 11, 2024

Just one minor convention thing: For the dumping, I report paths enclosed by "path", not path - I thought this is how it is usually done? Just to make sure.

As you say, this is merely convention, and the current convention in verdi is to use backticks.

@khsrali
Contributor

khsrali commented Apr 11, 2024

I had a look at the implementation and the documentation, all good!

@eimrek
Member Author

eimrek commented Apr 11, 2024

Just to prevent this from being merged now: as @mbercx reported to me, there is a problem with the documentation. Namely, it currently assumes that the config.json is backed up as well, acting as a reference for the user to restore the corresponding section in their .aiida/config.json. I am leaning towards keeping the config.json in the subfolders again. That would possibly also help with a potential verdi profile restore command implementation. I think @mbercx will write more details about this.

@mbercx
Member

mbercx commented Apr 11, 2024

A few notes after testing for a small and large database, followed up by a discussion with @eimrek:

A. Large Databases

Tested for a larger database with ~1 million nodes and 100GB of loose/packed files:

  • Initial maintenance and backup took 46 mins.
  • Immediately trying the backup afterwards took 13s.
  • Adding 100 new Int nodes and backing up took about the same (12s).
  • Adding 1000 new SinglefileData nodes and backing up took a little longer (20s).

So this all works very well! Great work 👍

B. Verbosity

I initially didn't like the amount of output, since the INFO verbosity also gives quite a bit of rsync content. Apparently this depends on the rsync version, and updating to v3 does improve things. Making REPORT the default is cleaner, but shows less progress which is problematic for large databases. So I'm fine keeping the default as is.

C. Restoring the profile

The main challenge at the moment is restoring the profile after backing it up. Already discussed the current instructions with @eimrek and he'll update a bit. It's clear that manually adapting the config.json and then reinstating the database and properly rsync-ing the repository is something most users will struggle with, and so a verdi profile restore command would be very useful in a future PR/release. For this PR, it would already be good to consider how we want this restore command to function and make sure we back up everything we need, both for the current manual restore and a future restore command. Basically, I'm going to try and consider if the config.json content of the corresponding profile should be added as well.

Let's say I've done two backups for a certain profile, and so my backup folder looks like this:

❯ tree -L 2
.
├── aiida-backup.json
├── backup_20240411133917_u19p
│   ├── container
│   └── db.psql
├── backup_20240411134236_fut0
│   ├── container
│   └── db.psql
└── last-backup -> backup_20240411134236_fut0

I've done this backup with the (very nice) SSH feature, and now want to restore after I've e.g. messed up my postgreSQL database/install or my hard drive got corrupted. I've executed all the required setup steps up to the point that I have AiiDA and services installed but no profile.

Manual restore

To manually restore the backed up profile (as described here), I'd have to:

  1. First manually adapt the .aiida/config.json file.
  2. Reinstate the PostgreSQL database from the dump to the database specified in the config.json file.
  3. rsync the repository to the location specified in the config.json file.
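
For concreteness, steps 2 and 3 might look roughly like the following, assuming a plain-SQL dump named db.psql as in this PR's backups; the database name, user, and paths are placeholders:

import subprocess

# Placeholder values; the real ones come from the profile's storage config.
backup = '/path/to/backup/last-backup'
db_name = 'aiida_db'
db_user = 'aiida_user'
repo = '/home/user/.aiida/repository/my-profile'

# 2. Recreate the database (same name and owner as the original) and load the dump.
subprocess.run(['createdb', '-O', db_user, db_name], check=True)
subprocess.run(['psql', '-d', db_name, '-f', f'{backup}/db.psql'], check=True)

# 3. rsync the disk-objectstore container back to the repository location.
subprocess.run(['rsync', '-azh', f'{backup}/container/', f'{repo}/container/'], check=True)

(The same-name/same-role caveat is the one @unkcpz runs into further down the thread.)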

I also tried creating a new profile and somehow using the backed up data, but unfortunately this fails no matter whether you first create the database or first create the profile (well, it throws a bunch of errors, but at first glance it does seem to work afterwards, at least when you create a profile based on an already existing database/repository). So adapting the config.json manually seems to be the only "clean" current approach for restoring a profile from a backup.

Here, having the config.json information from the backed up profile would probably be useful, if only to have the structure to copy paste and adapt where necessary. Still, it would be much better to have a dedicated command.

Using verdi profile restore

Ideally, if I wanted to reinstate my backed up profile, I would just be able to run:

verdi profile restore <PATH_TO_BACKUP>

And it would take the information from the backed up config.json where possible. Looking at the content:

  • The storage.backend, PROFILE_UUID and name of the profile are already stored in the aiida-backup.json file.
  • The process_control settings could in principle be different, but if I'm trying to reinstate a backed up profile it's likely these will be the same.
  • If I prefer certain options, I'll likely want to keep these as well. Similar for the default_user_email.
  • The settings that are most likely to require changes are the storage.config ones:
    • database_engine/database_hostname/database_port: I think for the vast majority of users these will not change (I might be myopic here).
    • database_name: We can try to recreate this database, and request a new database name in case it already exists.
    • database_username: Check if the user exists, and request the new username in case it doesn't.
    • database_password: Perhaps this shouldn't be stored in the backup... Admittedly we already store the password in the config.json file by default, so clearly security isn't our strong suit here. But the config.json could be in a much more secure location than the backup. Probably the password should be requested during the execution of the restore command.
    • repository_uri: I would still use this, asking the user if the repository should be in the same location as specified here, and prompt them for a new location otherwise.
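
A sketch of that last idea, assuming the backup grows a full storage config section as proposed here; the file name comes from this thread, but the key layout is an assumption, not an existing format:

import json

import click

# Reuse the backed-up configuration, but prompt for secrets instead of storing them.
with open('aiida-backup.json', encoding='utf-8') as handle:
    backed_up = json.load(handle)

storage_config = backed_up.get('storage', {}).get('config', {})  # key layout assumed
storage_config['database_password'] = click.prompt('Database password', hide_input=True)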

My message is already getting quite lengthy, and is mostly previewing the verdi profile restore command, so I'll stop here. Pertinent to this PR is that perhaps we should also add the config.json content of a profile to the backup, since it can be useful for both manual and future automated restorations of the backup.

TL; DR

  • Backup API is good and performs well for large databases! 👌
  • I think it'd be a good idea to add the profile-specific config.json content to the backup to make future restorations easier.

@sphuber
Contributor

sphuber commented Apr 12, 2024

Just had a discussion with @mbercx following his comment to store the full config.json for the particular profile that is being backed up.

TLDR: We think it is better to have verdi storage backup write the current state of the config.json to the root directory of the backup. This will be overwritten each time a new backup is created. It would essentially replace the custom aiida-backup.json that is currently written.

This solution is what we originally implemented. After discussion, Kristian and I decided to remove it since it wasn't clear what additional information the storage config would provide. Storing one copy per backup iteration wouldn't make sense, because if the storage config changed at some point, you most likely wouldn't want to revert to the old config even when restoring the data. The alternative was to store a single copy in the root directory, but this would be overwritten each time, losing the config options of previous iterations. So the only additional valuable information provided by backing up the entire config.json (namely the config options) would be lost over time, with only the last state being kept.

That being said, the config options are not really coupled to the state of the storage and are essentially independent. So it is not really important to keep a copy of the config options with each snapshot of the storage.

Finally, when we think of the various restore scenarios, there are essentially two:

  • A: The profile to be restored still exists in the current AiiDA instance and the user essentially wants to roll back the state of the data storage. In this case, the storage config can be taken from the global config.json and the version in the backup dir can be ignored.
  • B: The profile to be restored doesn't exist, for example when recreating a profile on a completely different machine after having copied over the backup dir. In this case, most likely the storage config is no longer relevant and the user would simply create a new profile with the same storage plugin and restore the data. But, as @mbercx pointed out, it could be that the system is almost the same (just the database got lost) and they want to reuse the same configuration. An automated verdi profile restore method could then make use of the backed-up storage config instead of prompting the user.

@eimrek Marnik told me that you had already discussed this and were also on board with this solution. If so, could you please revert to our original solution of just writing the config.json with just the backed up profile to the root backup dir? Thanks a lot!

@unkcpz
Member

unkcpz commented Apr 12, 2024

I did a test restoring the DB and repository backup. It all went fine.
However, it is not trivial to do. I had to go back and forth between different sections [1][2][3] of the documentation to create the correct parameters for config.json and set the proper user/db name for the new target psql entry. I think it would be great to combine the important information related to restoring in one place.

Option 1: Repeat some information and put it in https://aiida--6069.org.readthedocs.build/projects/aiida-core/en/6069/howto/installation.html#restoring-data-from-a-backup
Option 2: When running verdi storage backup, generate a README into the backup folder with instructions.

Any opinion on this?

Here are other minor mistakes I made that took some effort to figure out. It would be great to document them somewhere so others can avoid the problem.

  • The role (db user) and the database name are supposed to be the same as in the original DB, otherwise you will get "ERROR: role "" does not exist" or "FATAL: database "" does not exist".
  • Restoring the file repository is straightforward, but I made the mistake of specifying the target path as file://<path>; only <path> is needed.

But anyhow, the backup works perfectly for me, so it's also a green light from my side. Cheers @eimrek

@eimrek
Member Author

eimrek commented Apr 12, 2024

Thanks @mbercx @sphuber @unkcpz!

I will adapt the PR accordingly.

@eimrek eimrek requested a review from sphuber April 15, 2024 18:08
@sphuber
Contributor

sphuber commented Apr 17, 2024

Thanks for reverting the change to the backup config file @eimrek . I had a look and it looks good. I did have another think about the is_backup_implemented property solution. I implemented a quick alternative that gets rid of it and simply cleans up if the plugin class raises NotImplementedError. See the commit here: sphuber@e0e5933
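
The shape of that alternative, as a minimal hedged sketch rather than the actual commit (the _backup_backend hook name is taken from the commit message quoted further down):

import shutil
from pathlib import Path


class StorageBackend:

    def backup(self, dest: Path, **kwargs):
        """Create the output folder, but remove it again if the plugin has no backup support."""
        dest.mkdir(parents=True, exist_ok=True)
        try:
            self._backup_backend(dest, **kwargs)  # implemented by storage plugins
        except NotImplementedError:
            # Don't leave an empty folder behind when the plugin cannot back up.
            shutil.rmtree(dest)
            raise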

I think I would prefer this solution. Let me know what you think.

@eimrek
Member Author

eimrek commented Apr 18, 2024

Thanks @sphuber, yep, that looks good! Although rm -rf is a bit scary, I don't see any way it could go wrong in this case. Feel free to merge.

EDIT: actually, this could be a potential problem: what if the user is doing regular backups for a profile to a folder, using a specific version of aiida that implements the backup mechanism of the backend, and for whatever reason the user has to downgrade to an aiida version where the specific backend does not implement the backup method? I think in this case all of the backups would be deleted, no?

@sphuber
Contributor

sphuber commented Apr 18, 2024

And for whatever reason the user has to downgrade to an aiida version where the specific backend does not implement the backup method. I think in this case all of the backups would be deleted, no?

Quite the edge-case, but fair enough, I think that would indeed be problematic. What about adding a check making sure the directory is empty (excluding the config.json)?
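
Such a check is small; a sketch of what it could look like, with hypothetical names:

from pathlib import Path


def is_empty_except_config(folder: Path) -> bool:
    """True if the folder contains nothing besides an optional config.json."""
    return all(path.name == 'config.json' for path in folder.iterdir())

The cleanup after a NotImplementedError would then only remove the folder when this returns True.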

@sphuber sphuber added the priority/critical-blocking (must be resolved before next release) label Apr 18, 2024
@eimrek
Member Author

eimrek commented Apr 18, 2024

@sphuber yep, that should probably be fine.

@sphuber
Contributor

sphuber commented Apr 18, 2024

@eimrek I noticed that in _validate_or_init_backup_folder you already create the config.json file. Why is this? It is also already written by the backup method, so why do we need it in the initialization as well? I am asking because it makes the cleanup in case of a NotImplementedError more complex, since I have to make an exception for config.json to determine whether the folder is empty.

@eimrek
Member Author

eimrek commented Apr 18, 2024

@sphuber The reasoning was that it's good to have the folder "reserved" for that single profile before the initial backup starts (which can take a long time), to prevent accidentally backing up a different profile in the same folder. Additionally, if the backup is interrupted for whatever reason (and leaves a live-backup folder), it's good to know what profile it was for and potentially resume the backup.
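
A sketch of that "reservation" check; the PROFILE_UUID key is mentioned earlier in this thread, but the exact names here are assumptions for illustration:

import json
from pathlib import Path


def check_folder_reserved_for(folder: Path, profile_uuid: str) -> None:
    """Refuse to proceed if the folder was initialized for a different profile."""
    config = folder / 'config.json'
    if config.exists():
        backed_up = json.loads(config.read_text())
        if backed_up.get('PROFILE_UUID') != profile_uuid:  # key name assumed
            raise ValueError('Backup folder already belongs to a different profile.')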

@eimrek
Member Author

eimrek commented Apr 18, 2024

@sphuber On the other hand, we could generate the config.json file in the backup method before doing the actual backup. Then, I think, it wouldn't need to be in the _validate_or_init_backup_folder method any more, and that method could just be called _validate_backup_folder. That would perhaps reduce the complexity a bit.

@sphuber
Contributor

sphuber commented Apr 18, 2024

@sphuber On the other hand, we could generate the config.json file in the backup method before doing the actual backup. Then, I think, it wouldn't need to be in the _validate_or_init_backup_folder method any more, and that method could just be called _validate_backup_folder. That would perhaps reduce the complexity a bit.

That wouldn't help unfortunately, I think, because the config.json would still be created before doing the actual backup, which can then raise NotImplementedError, so we would be back to the old situation.

if the backup is interrupted for whatever reason (and leaves a live-backup folder), it's good to know what profile it was for and potentially resume the backup.

I don't think this would help right? Is there support for resuming an interrupted backup?

To prevent accidentally backing up a different profile in the same folder.

This is a fair point. Again very much an edge-case, but could happen. Guess I will have to just perform the logic that excludes config.json from the contents check.

@eimrek
Member Author

eimrek commented Apr 18, 2024

I don't think this would help right? Is there support for resuming an interrupted backup?

I think there implicitly is support for this. If a backup is interrupted, there is a partially completed live-backup folder. The next attempt will rsync to the same location, and rsync will resume from the previous attempt. E.g. I just tried this by interrupting the backup after the sql database was rsynced: the rsync "speedup" values from scratch versus on top of the existing live-backup were 5.02 and 85.88, respectively.

That wouldn't help unfortunately, I think, because the config.json would still be created before doing the actual backup, which can then raise NotImplementedError, so we would be back to the old situation.

Agreed. I was just mentioning it as that would potentially make the code a bit simpler.

The `is_backup_implemented` property was added to the storage interface
such that the base class can call this before calling the actual storage
backend. The reason is that the creation of the output folder has to be
done before `_backup_backend` is called, but if that then raises
`NotImplementedError` the output directory would have been created for
nothing and remain empty.

The current solution adds an additional method to the abstract interface
called `is_backup_implemented`. Storage plugins should override this to
return `True` in addition to implementing `_backup_backend`. This feels
a bit redundant and can easily be forgotten by developers.

Here as an alternative, the `is_backup_implemented` is removed and the
base class `backup` method will simply create the output folder and then
delete it if the plugin raises `NotImplementedError`.

Fix/backup script alternative is backend implemented
Contributor

@sphuber sphuber left a comment


Alright, I am pulling the trigger on this one. We can still test this before the release. Thanks a lot for all the work @eimrek !

@sphuber sphuber merged commit bf79f23 into main Apr 18, 2024
53 checks passed
@sphuber sphuber deleted the backup-script branch April 18, 2024 22:19
Labels
priority/critical-blocking (must be resolved before next release)